Abstract

Compilation for embedded processors can be either aggressive 
(time consuming cross-compilation) or just in time (embedded and 
usually dynamic). The heuristics used in dynamic compilation are 
highly constrained by limited resources, time and memory in par- 
ticular. Recent results on the SSA form open promising directions 
for the design of new register allocation heuristics for embedded 
systems and especially for embedded compilation. In particular, 
heuristics based on tree scan with two separated phases — one for 
spilling, then one for coloring/coalescing — seem good candidates 
for designing memory-friendly, fast, and competitive register allo- 
cators. Still, also because of the side effect on power consumption, 
the minimization of loads and stores overhead (spilling problem) is 
an important issue. This paper provides an exhaustive study of the 
complexity of the "spill everywhere" problem in the context of the 
SSA form. Unfortunately, contrary to our initial hopes, many of
the questions we raised lead to NP-completeness results. We identify
some polynomial cases, but they are impractical in a JIT context.
Nevertheless, they can give hints to simplify formulations for the 
design of aggressive allocators. 

* Categories and Subject Descriptors: D.3.4 [Programming Lan- 
guages]: Processors — Code generation, Optimization; F.2.0 [Anal- 
ysis of Algorithms and Problem Complexity] 

* General Terms: Algorithms, Performance, Theory. 

* Keywords: Register allocation, SSA form, Spill, Complexity. 

1. Introduction 

Register allocation is one of the most studied problems in compila- 
tion. Its goal is to map the temporary variables used in a program 
to either machine registers or main memory locations. The com- 
plexity of register allocation for a fixed schedule comes from two 
main optimizations, spilling and coalescing. Spilling decides which 
variables should be stored in memory to make register assignment
possible (the mapping of other variables to registers) while
minimizing the overhead of stores and loads. Register coalescing aims
at minimizing the overhead of moves between registers. 

Compilation for embedded processors is either aggressive or 
just in time (JIT). Aggressive compilation is allowed to use a long 
compile time to find better solutions. Indeed, the program is usually
cross-compiled, then loaded in permanent memory (ROM, flash,
etc.), and shipped with the product. Hence the compilation time 
is not the main issue as compilation happens only once. Further- 
more, especially for embedded systems, code size and energy con- 
sumption usually have a critical impact on the cost and the quality 
of the final product. Just-in-time compilation is the compilation of 
code on the fly on the target processor. Currently the most promi- 
nent languages are CLI and Java. The code can be uploaded or sold 
separately on a flash memory, then compilation can be performed 
at load time or even dynamically during execution. The heuristics 
used, constrained by time and limited memory, are far from being 
aggressive. In this context there is a trade-off between resource usage
for compilation and quality of the resulting code. 

1.1 SSA Properties 

The static single assignment (SSA) form is an intermediate repre- 
sentation with very interesting properties. A code is in SSA form 
when every scalar variable has only one textual definition in the 
program code. Most compilers use a particular SSA form, the strict 
SSA form, with the additional so-called dominance property: given 
a use of a variable, the definition occurs before any uses on any 
path going from the beginning of the program (the root) to a use. 
One of the useful properties of such a form is that the dominance 
graph is a tree and the live ranges of the variables (delimited by 
the definition and the uses of a variable) can be viewed as subtrees 
of this dominance tree. A well-known result of graph theory states 
that the intersection graph of subtrees of a tree is chordal (see
details in [13, p. 92]). Since coloring a chordal graph is easy using
a greedy algorithm, it has the consequence for register allocation
that the "assignment problem" [10, p. 622] (mapping of variables
to registers with no additional spill) is also easy. 

The fact that the interference graph of a strict SSA code is 
chordal, and therefore easy to color, leads to promising directions 
for the design of new register allocation heuristics. 

1.2 Recent Developments in Register Allocation 

Spilling and coalescing are correlated problems that are, in classical 
approaches, done in the same framework. Even if "splitting", i.e., 
adding register-to-register moves, is sometimes considered in such 
a framework, it is very hard to control the interplay between spilling 
and splitting/coalescing. The properties of SSA form have led to new
approaches where spilling and coalescing are treated separately: the
first phase of spilling decides which values are spilled and where,
so as to get a code with $\text{Maxlive} \le k$ where Maxlive is the maximal
number of variables simultaneously live and k is the number of 
available registers. The second phase of coloring (assignment)
maps variables to registers with no additional spill. When possible,
it also removes move instructions (called shuffle code in [18])
by coalescing. This is the approach advocated by Appel and
George [1] and, more recently, in [6, 17, 4, 5]. The interest of this
approach for embedded systems is twofold. 

1. Because power consumption has to be minimized, it is very im- 
portant to optimize memory transfers and thus design heuristics 
that spill less. This new approach makes it possible to design much more
aggressive spilling algorithms for aggressive compilers. 

2. For JIT compilation, this approach makes it possible to design very
fast spilling heuristics. In a graph coloring approach [5], the spilling
decision is subordinate to coloring. On the other hand, when the
spilling phase is decoupled from the coloring/coalescing phase,
i.e., when one considers it better to avoid spilling at the price
of register-to-register moves, then testing if spilling is required
simply relies on checking whether the number of simultaneously live
variables (register pressure) is at most k. This simple test
can be performed directly on the control flow graph and the
construction of an interference graph can thus be avoided. This
point is especially interesting for JIT compilation since building
an interference graph is not only time consuming [9], but also
memory consuming [7].
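
The register pressure test mentioned in the second point can be illustrated on a single basic block. The following Python sketch is only an illustration under simplifying assumptions: live ranges are given as closed intervals of integer program points, no interference graph is built, and the names max_live and needs_spill are hypothetical (they do not come from any of the cited allocators).

```python
def max_live(intervals):
    """Return Maxlive for live ranges given as (start, end) pairs.

    Both bounds are inclusive integer program points of one basic block.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))      # the variable becomes live at start
        events.append((end + 1, -1))   # and is dead just after end
    live = best = 0
    # Tuples sort by point first; at equal points, -1 sorts before +1,
    # so a range ending at p - 1 is released before one starting at p.
    for _, delta in sorted(events):
        live += delta
        best = max(best, live)
    return best


def needs_spill(intervals, k):
    """Spilling is required exactly when Maxlive exceeds the k registers."""
    return max_live(intervals) > k
```

The test only sorts 2n events for n live ranges, which is the kind of cost a JIT allocator can afford; a control-flow-graph version would need liveness sets per program point instead of intervals.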

The second advantage of the dominance property under SSA 
form is that the coloring can be performed greedily on the control 
flow graph. The principle for coloring a program under SSA form 
can be seen as a generalization of linear scan. 

Linear scan: In a linear scan algorithm, the program is mapped to 
a linear sequence. On this sequence, the live range of a variable is 
a union of intervals with gaps in between. The sequence is scanned
from top to bottom and, when an interval is reached, it is given an 
available color, i.e., not already used at this point. In Poletto and 
Sarkar's approach [19], each variable is pessimistically represented
by a unique interval that contains all the effective intervals (the gaps 
are "filled"). It has the negative effect of overestimating the register 
pressure between real intervals but it ensures that all intervals of the 
same variable are assigned the same register. In some way, Poletto 
and Sarkar's algorithm provides a "color everywhere" allocation, 
i.e., it does not perform any live-range splitting. Allowing the 
assignment of different colors for a given variable requires shuffle 
code [20, 21] to be inserted afterwards to repair inconsistencies.
Such a repairing phase requires additional data-flow analysis that
might be too costly in a JIT context.

Tree scan: Coloring a program under SSA can be seen as a tree 
scan: the program is mapped on the dominance tree, live ranges 
are subtrees. The dominance tree is scanned from root to leaves 
and when an interval is reached it is given an available color. 
Here the liveness is accurate and there is no need for gap filling 
or additional live range splitting. Replacing φ-functions by shuffle
code does not require any global analysis. In other words, tree scan 
is a generalization of linear scan. 
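
The tree scan principle can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of any cited allocator: the dominance tree is given as a dictionary of children, each live range is given directly as the set of tree points it covers (assumed to form a connected subtree), and the function name tree_scan is hypothetical.

```python
def tree_scan(children, root, live, k):
    """Greedily color live ranges that are subtrees of a tree.

    children: dict point -> list of child points
    live:     dict variable -> set of points (assumed connected in the tree)
    k:        number of available colors (registers)
    Returns a dict variable -> color in range(k).
    Raises ValueError if more than k variables are live at some point.
    """
    at = {}                                   # variables live at each point
    for v, points in live.items():
        for p in points:
            at.setdefault(p, []).append(v)

    color = {}
    stack = [root]
    while stack:                              # parents are visited before children
        p = stack.pop()
        here = at.get(p, [])
        used = {color[v] for v in here if v in color}
        free = [c for c in range(k) if c not in used]
        for v in here:
            if v not in color:                # p is the topmost point of v
                if not free:
                    raise ValueError("register pressure exceeds k at point %r" % (p,))
                color[v] = free.pop(0)
        stack.extend(children.get(p, []))
    return color
```

Because a variable is colored at the first (topmost) point where it is met and keeps that color in its whole subtree, no repair code is needed inside a live range, which is the property the paragraph above relies on.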

1.3 Spill Everywhere 

As already mentioned, the dominance property of SSA form sug- 
gests promising directions for the design of new register allocation 
heuristics especially for JIT compilation on embedded systems. 
The motivation of our study was driven by the hope of design- 
ing both fast and efficient register allocation based on SSA form. 
Notice that answering whether spilling is necessary or not is easy
(even if there can be some subtleties) while minimizing
the amount of load and store instructions is the real issue. In other 
words, if the search space is now cleanly delimited, the objective 
function that corresponds to minimizing the spill cost still has some
open issues. So the question is: Is it easier to solve the spilling 
problem under SSA? In particular is the spill everywhere problem 
simple under SSA form? 

The spilling problem can be considered at different granularity
levels: the highest, so-called spill everywhere, corresponds to
considering the live range of each variable entirely. A spilled variable
will then lead to a store after the definition and a load before each
use. The finer granularity, so-called load-store optimization,
corresponds to optimizing each load and store separately. The latter
problem, also known as paging with write back, is NP-complete [11]
on a basic block even under SSA form. The former problem is
much simpler, and a well-known polynomial instance [2] exists
under SSA form on a basic block. To develop new spilling heuristics,
studying the complexity of spilling everywhere is very important 
for the design of both aggressive and JIT register allocators. 

1. First, the complexity of the load-store optimization problem 
comes from the asymmetry between loads and stores [11]. The
main difference between the load-store optimization problem 
and the spill everywhere problem comes from this asymmetry. 
We have measured that, in practice, most SSA variables have 
only one or two uses. So, it is natural to wonder whether this 
singularity makes the load-store optimization problem simpler 
or not. The extreme case with only one use per variable is equiv- 
alent to the spill everywhere problem. More generally, even in 
the context of a traditional compiler, the spill everywhere prob- 
lem can be seen as an oracle for the load-store optimization 
problem to answer whether a variable should be stored or not. 
In the context of aggressive compilation [15, 14], a way to
decrease the complexity is to restore the symmetry between loads
and stores as done in [1].

2. Second, spill everywhere is a good candidate for designing 
simple and fast heuristics for JIT compilation on embedded 
systems. Again, in this context, the complexity and the footprint 
of the compiler are an issue. Spilling only parts of the live
ranges, as opposed to spilling everywhere, leads to irregular 
live range splitting and the insertion of shuffle code to repair 
inconsistencies, in addition to maintaining liveness information 
for coalescing purposes. All of this is probably too costly for
some embedded compilers. 

Studying the complexity of the spill everywhere problem in the 
context of SSA form is thus important to guide the design of both 
aggressive and JIT register allocation algorithms. This is the goal of
this paper. To our knowledge this is the first exhaustive study of this 
problem in the literature. 

1.4 Overview of the paper 

The rest of the paper is organized as follows. For our study, we
considered different variants of the spilling problem. Section 2 provides
the terminology and notation that describe the different cases we
considered. Section 3 considers the simplified spill model where a
spilled variable frees a register for its whole live range; we provide
an exhaustive study of its complexity under SSA form. Section 4
deals with the problem where a spilled variable might still need to
reside in a register at its points of definition and uses. Here, the
study is restricted to basic blocks as it is already NP-complete for
this simple case. Section 5 summarizes our results and concludes.



1 In this formulation, a variable might be either in a memory location or in a
register, but cannot reside in both.



2. Terminology and Notation 

Context: For the purpose of our study, we consider different
configurations depending on whether live ranges are restricted to a basic
block or not. Indeed, on a basic block, the interference graph is an 
interval graph, while for a general control flow graph, under strict 
SSA form, it is chordal. We also consider whether the use of an 
evicted variable in an instruction requires a register or not. If not, 
spilling a variable corresponds to decreasing by one the register 
pressure at every point of the corresponding live range. Otherwise,
spilling a variable does not decrease the register pressure on
program points that use it: in that case, instead of having the effect 
of removing the entire live range, spilling a variable corresponds to 
removing a version of the live range with "holes" at the use and def- 
inition points. We denote those two problems respectively as with- 
out holes or with holes. Finally, we distinguish the cases where the 
cost of spilling is the same for all variables or not. We denote those 
two problems respectively as unweighted (denoted by $w(v) = 1$ for
all $v$) or weighted (denoted by $w \neq 1$).

Decreasing Maxlive: As mentioned earlier, the goal of the spilling
problem is simply to lower the register pressure at every program
point, while the corresponding optimization problem is to minimize
the spilling cost. At a given program point, the register pressure is
the number of variables alive there. The maximum over all program
points, usually named Maxlive, will be denoted by $\Omega$ here. Let us
denote by r the number of available registers. Hence formally, the
goal is to decrease $\Omega$ by spilling some variables. If we denote by $\Omega'$
the register pressure after this spilling phase, we distinguish the
following four problems: $\Omega' \le \Omega - 1$, $\Omega' \le \Omega - k$ where k is a
constant, $\Omega' \le k$ where k is a constant, and the general problem
$\Omega' \le r$ where there is no constraint on the number of registers r.

A graph problem: The spill everywhere problem without holes 
can be expressed as a node deletion problem [22]. The general
node deletion problem can be stated as follows: "Given a graph
or digraph G find a set of nodes of minimum cardinality, whose deletion
results in a subgraph or subdigraph satisfying the property $\pi$."
Hence, the results of the first section have a domain of application
not only on register allocation but also on graph theory. For this
reason, we formalize them using graphs (properties of the interference
graphs) instead of programs (register pressure on the control
flow graph) while the underlying algorithms are actually based on the
control flow graph representation.

Perfect graphs: Perfect graphs [13] have some interesting properties
for register allocation. In particular, they can be colored in
polynomial time, which suggests that we can design heuristics for
spilling or coalescing in order to change the interference graph into
a perfect graph. For a graph G, the maximal size of a complete
subgraph, i.e., a clique, is the clique number $\omega(G)$. The minimum
number of colors needed to color G is the chromatic number $\chi(G)$.
Of course, $\omega(G) \le \chi(G)$ because vertices of a clique must have
different colors. A graph G is perfect if each induced subgraph G' of G
(including G itself) is such that $\chi(G') = \omega(G')$. A chordal graph is
a perfect graph; it is the intersection graph of subtrees of a tree: 
to each subtree corresponds a vertex, and there is an edge between 
two vertices if the corresponding subtrees intersect. A well-known 
subclass of chordal graphs is the class of interval graphs, which are 
intersection graphs of subsequences of a sequence. 

3. Spill Everywhere without Holes 

It is well-known that, on a basic block, the unweighted spill
everywhere problem without holes is polynomial: this is the greedy
furthest use algorithm described by Belady [2]. It is less known that
the weighted version of this problem, which cannot be solved using
this last technique, is also polynomial [23, 11]: the interference
graph is an intersection graph for which the incidence matrix is
totally unimodular and the integer linear programming (ILP)
formulation can be solved in polynomial time. This property holds also
for a path graph, which is a class of intersection graphs between 
interval graphs and chordal graphs. We recall these results here for 
completeness. We also recalled earlier that, under SSA form, once 
the register pressure has been lowered to r at every program point, 
the coloring "everywhere" problem (each variable is assigned to a 
unique register) is polynomial. 

The natural question raised by these remarks is whether the 
spill everywhere problem without holes is polynomial or not. In 
other words, does the SSA form make this problem simpler? The 
answer is no. A graph theory result of Gavril and Yannakakis [23]
shows it is NP-complete, even in its unweighted version: for an
arbitrarily large number of registers r and a program with $\Omega$ arbitrarily
larger than r, spilling everywhere a minimum number of variables
such that $\Omega'$ is at most r is NP-complete. The main result of this
section shows more: this problem remains NP-complete even if one
requires only $\Omega' \le \Omega - 1$. The practical implication of this result is
that for a heuristic that would lower $\Omega$ one by one iteratively, even
the optimization of each separate step is an NP-complete problem.²

Table 1 summarizes the complexity results of spilling every-
where (without holes). We now recall classical results and prove 
new more accurate results. Let us start with the decision problem 
related to the most general case of spill everywhere without holes. 



Problem: Spill everywhere

Instance: A perfect graph $G = (V, E)$ with clique number
$\Omega = \omega(G)$, a weight $w(v) > 0$ for each vertex, an integer r, an
integer K.

Question: Can we remove the vertices in $V_S \subseteq V$ from G with
overall weight $\sum_{v \in V_S} w(v) \le K$ such that the clique number $\Omega'$
of the induced subgraph G' is at most r?



Theorem 1 (Furthest First). The spill everywhere problem for an 
interval graph is polynomially solvable, with a greedy algorithm, if 
w(v) = 1 for all v even if r is not fixed. 

The algorithm behind this theorem is the well-known furthest use 
strategy described by Belady in [2]. This strategy is very interesting
for designing spilling heuristics on the dominance tree (see for
example [16]). We give here a constructive proof for completeness.

Proof: An interval graph is the intersection graph of a family of 
sub-sequences of a (graph) chain. For convenience, we denote the 
chain as B, vertices of B are called points, and sub-sequences of B 
are called variables. Consecutive points are denoted by $p_1, \ldots, p_m$,
and the set of variables is denoted by V. Once variables are removed
(spilled), the remaining set of variables V' is called an allocation.
An allocation is said to fit B if, for each point p of B, the number 
of remaining variables intersecting p is at most r. The goal is to 
remove a minimum number of variables such that the remaining 
allocation fits B. The greedy algorithm can be described as follows: 

Step 0 (init) Let $V'_0 = V$ and $i = 1$;

Step 1 (find first) Let $p(i)$ be the first point from the beginning of
the chain such that more than r remaining variables, i.e., in $V'_{i-1}$,
intersect $p(i)$;

Step 2 (remove furthest) Select a variable $v_i$ that intersects $p(i)$ and
ends the furthest and remove it, i.e., let $V'_i = V'_{i-1} \setminus \{v_i\}$;

Step 3 (iterate) If $V'_i$ fits B, stop, otherwise increment i by 1 and
go to Step 1.
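
The four steps above can be sketched in Python as follows. This is a minimal sketch, assuming that variables are given as closed intervals of integer points and that the function name furthest_first is hypothetical. It scans the points in increasing order and repairs each overloaded point completely before moving on, which visits the same points p(i) as Step 1 because removing a variable never increases the pressure at an earlier point.

```python
def furthest_first(intervals, r):
    """Greedy furthest-first spilling on a chain of points.

    intervals: dict variable -> (start, end), inclusive integer points
    r:         number of registers
    Returns the list of spilled variables in the order they are chosen.
    """
    remaining = dict(intervals)
    spilled = []
    # The pressure can only increase where some variable starts,
    # so it is enough to inspect the start points, in increasing order.
    for p in sorted({start for start, _ in intervals.values()}):
        live = [v for v, (s, e) in remaining.items() if s <= p <= e]
        while len(live) > r:
            # Step 2: remove the variable that ends the furthest.
            victim = max(live, key=lambda v: remaining[v][1])
            live.remove(victim)
            del remaining[victim]
            spilled.append(victim)
    return spilled
```

For instance, with a = [1, 10], b = [2, 4], c = [3, 5], d = [6, 8] and r = 2, the first overloaded point is 3 and the sketch spills a, the variable that ends the furthest.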

2 Note that providing an optimal solution for each intermediate step (going
from $\Omega$ to $\Omega - 1$, then from $\Omega - 1$ to $\Omega - 2$, and so on, until $\Omega' = r$) does
not always give an optimal solution for the problem of going from $\Omega$ to r.

Table 1. Spill everywhere without holes.



Let us prove that the solution obtained by the greedy algorithm
is optimal. Consider an optimal solution S (described by a set $V_S$
of spilled variables) such that $V_S$ contains the maximum number
of variables $v_i$ selected by the greedy algorithm. Suppose that S
does not spill all of them and denote by $v_{i_0}$ the variable with
smallest index such that $v_{i_0} \notin V_S$. By definition of $p(i_0)$ in the greedy
algorithm, there are at least $r + 1$ variables not in $\{v_1, \ldots, v_{i_0 - 1}\}$
intersecting $p(i_0)$. As S is a solution, one of them, say v, is in $V_S$
(thus $v \neq v_{i_0}$). We claim that spilling
$W = V_S \cup \{v_{i_0}\} \setminus \{v\}$, i.e., spilling $v_{i_0}$ instead of v, is a solution too. Indeed,
for all points before $p(i_0)$ (excluded), the number of variables in
$V'_{i_0 - 1} = V \setminus \{v_1, \ldots, v_{i_0 - 1}\}$ is at most r. Since $\{v_1, \ldots, v_{i_0}\} \subseteq W$,
this is true for $V \setminus W$ too. Furthermore, each point p after $p(i_0)$
(included), intersected by v, is also intersected by $v_{i_0}$ by definition
of $v_{i_0}$. Thus, as p is intersected by at most r variables in $V \setminus V_S$, the
same is true for $V \setminus W$. Finally, this solution spills more variables $v_i$
than S, which is not possible by definition of S. Thus $V_S$ contains
all variables $v_i$ and, by optimality, only those. This proves that the
greedy algorithm gives an optimal solution. □

Theorem 2 (poly. ILP). The spill everywhere problem for an interval
graph is polynomially solvable even if $w \neq 1$ and r is not fixed.

This result was pointed out by Gavril and Yannakakis in [23] and
used in a slightly different context by Farach-Colton and
Liberatore [11]. The idea is to formulate the problem using ILP and to
remark that the matrix defining the constraints is totally unimodu- 
lar. For the sake of completeness, we provide the formulation here. 

Proof: We use the same notations as for Theorem 1 except that,
now, $v_1, \ldots, v_n$ denote all variables and not only those selected
by the greedy algorithm. Let $w_i$ be the cost of removing (spilling)
variable $v_i$. We define the clique matrix as the matrix $C = (c_{p,v})$
where $c_{p,v} = 1$ if v intersects the point p and $c_{p,v} = 0$ otherwise.
Such a matrix is called the incidence matrix of the interval
hypergraph and is totally unimodular [3]. The optimization problem can
be solved using the following integer linear program, where x is
a vector with components $(x_i)_{1 \le i \le n}$, w is a vector with components
$(w_i)_{1 \le i \le n}$, $\vec{r}$ is a vector whose components are all equal to r, and
vector inequalities are to be understood component-wise:

$$\max \{\, w \cdot x \mid Cx \le \vec{r},\ \vec{0} \le x \le \vec{1} \,\}$$

Of course, $x_i = 0$ means that $v_i$ should be removed while $x_i = 1$
means it should be kept. The matrix of the system is C with some
additional identity matrices, which keeps the total unimodularity. □
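
To make the formulation concrete, here is a small worked instance. It is a hypothetical toy example chosen for illustration, not one taken from the cited references: three points, three variables with $v_1$ covering $p_1, p_2$, $v_2$ covering $p_2, p_3$, and $v_3$ covering $p_3$, one register, and weights $(2, 3, 2)$. Rows of C correspond to points and columns to variables.

```latex
C = \begin{pmatrix}
      1 & 0 & 0 \\
      1 & 1 & 0 \\
      0 & 1 & 1
    \end{pmatrix},
\qquad w = (2, 3, 2), \qquad r = 1,
\qquad
\max \{\, 2x_1 + 3x_2 + 2x_3 \mid
      x_1 \le 1,\; x_1 + x_2 \le 1,\; x_2 + x_3 \le 1,\; 0 \le x_i \le 1 \,\}
```

The 0/1 solutions that keep the most weight are $x = (1, 0, 1)$ with value 4 and $x = (0, 1, 0)$ with value 3, so the optimum keeps $v_1$ and $v_3$ and spills $v_2$ at cost 3. In each column of C the ones are consecutive, which is the structure behind the total unimodularity used in the proof.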

The next two theorems are from Yannakakis and Gavril [23].

Theorem 3 (Yannakakis). The spill everywhere problem is NP-complete
for a chordal graph even if $w(v) = 1$ for each $v \in V$.

Another important result of [23] is that the spill everywhere
problem is polynomially solvable when r is fixed. Of course, there 
is a power of r in the complexity of their algorithm, but it means 
that if r is small, the problem is simpler. Because of this, we call 
the problem when r is fixed "spill everywhere with few registers". 



Problem: Spill everywhere with few registers (k)
Instance: A perfect graph $G = (V, E)$ with clique number $\Omega$, a
weight $w(v) > 0$ for each vertex, an integer K; $r = k$ is fixed.
Question: Can we remove vertices $V_S \subseteq V$ from G with overall
weight $\sum_{v \in V_S} w(v) \le K$ such that the induced subgraph G' has
clique number $\Omega' \le r$?



Theorem 4 (Dynamic programming on non-spilled variables). The
spill everywhere problem with few registers is polynomially solvable
if G is chordal even if $w \neq 1$.

When we proved our results, we were actually not aware of
Gavril and Yannakakis's paper. Since Theorem 4 is very intuitive,
we logically ended with the same kind of construction. For
completeness, we provide it here, with our own notations. This proof is
constructive and the algorithm (dynamic programming on program
points) is based on a tree traversal. It performs $O(m\Omega^k)$ steps of
dynamic programming, where m is the number of program points.

Proof: A chordal graph is the intersection graph of a family V of
subtrees of a tree T (Thm 4.8 [13]). We call points the vertices of
the tree T and, to distinguish the maximal subtrees $T_p$ rooted at
each given point p from the subtrees of the family V, we call the
latter variables. Given a point p and a set $W \subseteq V$ of variables,
let $W(p)$ be the set of variables $v \in W$ intersecting p, i.e., such
that p belongs to the subtree v. If $|W(p)| \le r$, we say that W
fits p and that $W(p)$ is a fitting set for p. We say that W fits a
set of points if it fits each of these points. A solution to the spill
everywhere problem with r registers is thus a subset W of V such
that W fits T. It is an optimal solution if $\sum_{v \in W} w(v)$ is maximal. With
these notations, W corresponds to $V \setminus V_S$ in the spill everywhere
problem formulation, and maximizing the cost of W is equivalent
to minimizing the weight of $V_S$.

Given a subset of variables W, we consider its restriction, denoted
by $W_p$, to a subtree $T_p$: it is defined as the set of variables
$v \in W$ that have a non-empty intersection with $T_p$. Note that
if W fits T, then its restriction $W_p$ to a subtree $T_p$ fits $T_p$.
Furthermore, if $p_1$ and $p_2$ are children of p in T then, because of the
tree structure, all variables that belong to both $W_{p_1}$ and $W_{p_2}$
intersect p, and all variables in $W_{p_i}$ intersecting p intersect also $p_i$,
i.e., $W_{p_i}(p) = W_p(p_i)$. These remarks ensure the following. Let W
be a fitting set for $T_p$ and let $W'_{p_i}$ be a fitting set for $T_{p_i}$ such that
$W'_{p_i}(p) = W_{p_i}(p)$ (i.e., they coincide between p and $p_i$). Then,
replacing $W_{p_i}$ by $W'_{p_i}$ in W leads to another fitting set of $T_p$. This is
the key to get an optimal solution thanks to dynamic programming.

The final proof is an induction on the points p of T, from
the leaves to the root, and on the fitting sets of those points
$F_p \in \mathcal{F}_p = \{W \subseteq V(p);\ |W| \le r\}$. Let us denote by $W_{\max}(p, F_p)$ a
subset W of V that contains only variables intersecting $T_p$, such
that $W(p) = F_p$, and with maximal cost. It can be built recursively
as follows. For each child $p_i$ of p, consider all possible fitting
sets $F_{p_i}$ that match $F_p$, i.e., such that $F_{p_i} \cap V(p) = F_p \cap V(p_i)$,
and pick the solution such that the cost of $W_{\max}(p_i, F_{p_i})$ is maximal. From
these selected subsets, one for each $p_i$, $W_{\max}(p, F_p)$ can be defined.
This construction is done for each $F_p \in \mathcal{F}_p$. As there are at most
$|V(p)|^k \le \Omega^k$ such fitting sets for p, these successive locally optimal
solutions can be built in polynomial time. □
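
The recursion used in this proof can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: the tree is a dictionary of children, each variable is given as the set of tree points it covers (assumed connected), the function name max_kept_weight is hypothetical, and the sketch returns only the maximal weight of the kept variables, not the set itself. It enumerates fitting sets naively, so it is only meant for small r.

```python
from itertools import combinations


def max_kept_weight(children, root, live, weight, r):
    """Maximal total weight of variables that can all stay in r registers.

    children: dict point -> list of child points of the tree T
    live:     dict variable -> set of points (a connected subtree of T)
    weight:   dict variable -> spill cost w(v)
    """
    at = {}                                   # at[p] plays the role of V(p)
    for v, points in live.items():
        for p in points:
            at.setdefault(p, set()).add(v)

    def cost(vs):
        return sum(weight[v] for v in vs)

    def solve(p):
        vp = at.get(p, set())
        # One entry per fitting set F_p: a subset of V(p) of size at most r.
        table = {}
        for size in range(min(r, len(vp)) + 1):
            for combo in combinations(sorted(vp), size):
                table[frozenset(combo)] = cost(combo)
        for c in children.get(p, []):
            child, vc = solve(c), at.get(c, set())
            for f in table:
                # Matching child sets agree with f on variables live at both
                # points; the shared weight is subtracted so that it is
                # counted only once, at p.
                table[f] += max(val - cost(fc & f)
                                for fc, val in child.items()
                                if fc & vp == f & vc)
        return table

    return max(solve(root).values())
```

The minimal spill cost is then the total weight of all variables minus the returned value, in line with the remark that maximizing the cost of W is equivalent to minimizing the weight of the spilled set.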

We now address the following problem, which is a particular 
case of the more general spill everywhere problem. 



Problem: Incremental spill everywhere
Instance: A perfect graph $G = (V, E)$ with clique number
$\Omega = \omega(G)$, a weight $w(v) > 0$ for each vertex, an integer K.
Question: Can we remove vertices $V_S \subseteq V$ from G with overall
weight $\sum_{v \in V_S} w(v) \le K$ such that the induced subgraph G' has
clique number $\Omega' \le \Omega - 1$?

The following theorem can be seen as a particular case of
Theorem 2. The proof is interesting since it provides an alternative
solution to the ILP formulation for this simpler case.

Theorem 5 (Dynamic programming on spilled variables). If G is
an interval graph, the incremental spill everywhere problem is
polynomially solvable, even if $w \neq 1$.

Proof: Let $B = \{p_1, \ldots, p_m\}$ be a linear sequence of points, $p_i < p_j$
if $i < j$, and $V = \{v_1, \ldots, v_n\}$ be a set of weighted variables, where
each variable $v_i$ corresponds to an interval $[s(v_i), e(v_i)]$. We assume
that the variables are sorted by increasing starts, i.e., $s(v_i) \le s(v_j)$
if $i < j$. Without loss of generality, the problem can be restricted
to the case where any point p belongs to exactly $\Omega$ variables (any
other point can be deleted from the instance). So for each point,
one needs to spill at least one of the intersecting variables. What we
seek is thus a minimum weighted cover of B by the variables of V,
which can be done thanks to dynamic programming as follows.

Let $W(p_i)$ be the minimum cost of a cover of $p_1, \ldots, p_i$. Knowing
all $W(p_j)$ for $j < i$, it is possible to compute $W(p_i)$. Indeed, at $p_i$, one
must choose a variable $v \in V(p_i)$, i.e., intersecting the point $p_i$. As v
already covers the interval between its start $s(v)$ and $p_i$, we get:

$$W(p_i) = \min_{v \in V(p_i)} \bigl( w(v) + W(\mathrm{pred}[s(v)]) \bigr) \quad \text{where } \mathrm{pred}[p_i] = p_{i-1}$$

with the convention $W(p) = 0$ for $p < p_1$. $W(p_m)$ is the minimum
cost of an incremental spilling over the whole basic block B. The
set $V(p_i)$ can be computed from $V(p_{i-1})$ in $O(\Omega)$ operations because
the variables are sorted by increasing starts. The overall complexity
is thus $O(\Omega m)$. □
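
The recurrence of this proof can be sketched in Python. This is a simplified illustration, not the authors' code: points are numbered 1 to m, every point is assumed to be covered by at least one variable (the instance is assumed to be already restricted as described in the proof), and the name min_incremental_spill is hypothetical. For brevity the sketch rescans all variables at each point, so it runs in O(mn) time rather than the $O(\Omega m)$ obtained with sorted starts.

```python
def min_incremental_spill(m, variables):
    """Minimum total weight of variables covering every point 1..m.

    variables: list of (start, end, weight), inclusive integer bounds.
    Every point is assumed to be intersected by at least one variable.
    """
    INF = float("inf")
    # best[i] is W(p_i); best[0] = 0 encodes the convention W(p) = 0 for p < p_1.
    best = [0] + [INF] * m
    for i in range(1, m + 1):
        for start, end, weight in variables:
            if start <= i <= end:
                # Spilling this variable covers start..i; the points before
                # its start are covered optimally at cost best[start - 1].
                best[i] = min(best[i], weight + best[start - 1])
    return best[m]
```

On four points with variables (1, 2, 3), (2, 4, 2), (1, 4, 6) and (3, 4, 1), the minimum cover spills the first and last variables for a total weight of 4.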

Theorem 6 (From 3-exact cover). The incremental spill everywhere
problem is NP-complete for a chordal graph even if $w(v) = 1$
for each $v \in V$.

Proof: As for Theorem 4, we use the characterization of a chordal
graph as an intersection graph of a family of subtrees of a tree. We
use the same notations. The proof is a reduction from Exact Cover
by 3-Sets (X3C) [12, Problem SP2]: let P be a set of 3n elements
$\{p_1, p_2, \ldots, p_{3n}\}$, and $\mathcal{V} = \{v_1, v_2, \ldots, v_m\}$ a set of subsets of P
where each subset contains exactly three elements of P. Does $\mathcal{V}$
contain an exact cover of P, i.e., a sub-collection $\mathcal{S} \subseteq \mathcal{V}$ such that
every element of P occurs in exactly one member of $\mathcal{S}$?

Let us consider an instance of X3C and define the following
family of subtrees of a tree: the main tree T is of height 2 with one
root point labeled $p_0$ and 3n leaves labeled $p_1, p_2, \ldots, p_{3n}$. For each
$v_i = \{p_\alpha, p_\beta, p_\gamma\}$ there is a subtree (variable) made of the root $p_0$ and
the tree points $p_\alpha, p_\beta, p_\gamma$. The number of variables intersecting $p_0$
is m, so $\Omega = m$. Let us create as many additional variables as
necessary (we call them non-labeled variables) so that the number
of intersecting variables is exactly $\Omega$ for each point of T. In other
words, for a leaf $p_j$ that belongs to k subtrees $v_i$, we create $m - k$
subtrees, each containing only $p_j$. Given this family of subtrees
of a tree, consider the corresponding intersection graph (which is
chordal). We now show that this instance of X3C has a solution if
and only if it is possible to remove (spill) at most $n = K$ variables
such that, for each point p, the number of remaining intersecting
variables is at most $\Omega - 1$. Notice that the reduction is polynomial:
the whole number of variables is not larger than $3n \times m$.

Suppose that there is a solution to the incremental spill everywhere
problem and let $V_S$ be the set of removed variables with
$|V_S| \le n$. There is no non-labeled variable in $V_S$ because $\Omega$ must
be decreased in the 3n leaves and only a labeled variable goes over
three leaves. Hence $V_S$ contains only labeled variables, $|V_S| = n$,
and the corresponding set of subsets $\mathcal{S}$ is a covering of P. Conversely,
suppose that the X3C instance has a solution $\mathcal{S}$ and let $V_S$
be the set of corresponding subtrees. Since $\mathcal{S}$ is a covering of P,
$|\mathcal{S}| = n$ and there is exactly one intersecting set in $V_S$ for each leaf.
So the number of remaining intersecting variables is $\Omega - 1$ for each
leaf. As for the root $p_0$, all labeled variables intersect it, so there is at
least one (labeled) variable removed and the number of remaining
intersecting variables is at most $\Omega - 1$. In other words, $V_S$ is a solution,
with $|V_S| \le n$, to the incremental spill everywhere problem.

This proves that the incremental spill everywhere problem is 
NP-complete (the fact it belongs to NP is straightforward). □ 
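The construction used in this proof can be checked mechanically on small instances. The following Python sketch is only illustrative: the function names (build_instance, can_lower_pressure, has_exact_cover) and the encoding of tree points as integers (0 for the root p_0, 1 to 3n for the leaves) are assumptions made for this example, and both checks are exponential brute-force searches, not practical algorithms.

```python
from itertools import combinations

def build_instance(n, subsets):
    """Return (variables, m); each variable is the set of tree points it covers.
    Point 0 is the root p_0, points 1..3n are the leaves (assumed encoding)."""
    m = len(subsets)
    labeled = [frozenset(s) | {0} for s in subsets]       # root + three leaves
    padding = []
    for leaf in range(1, 3 * n + 1):
        k = sum(leaf in s for s in subsets)
        padding.extend([frozenset({leaf})] * (m - k))     # non-labeled variables
    return labeled + padding, m

def pressure(variables, point):
    """Number of variables (subtrees) that contain the given point."""
    return sum(point in v for v in variables)

def can_lower_pressure(n, subsets):
    """Brute force: can removing at most n variables bring every point to m - 1?"""
    variables, m = build_instance(n, subsets)
    points = range(3 * n + 1)
    assert all(pressure(variables, p) == m for p in points)  # Maxlive is m everywhere
    for size in range(n + 1):
        for removed in combinations(range(len(variables)), size):
            kept = [v for i, v in enumerate(variables) if i not in removed]
            if all(pressure(kept, p) <= m - 1 for p in points):
                return True
    return False

def has_exact_cover(n, subsets):
    """Brute force X3C: n triples covering 3n elements are necessarily disjoint."""
    universe = set(range(1, 3 * n + 1))
    return any(set().union(*chosen) == universe
               for chosen in combinations(subsets, n))

# Two small instances with n = 2 (P = {1, ..., 6}).
YES_INSTANCE = [{1, 2, 3}, {4, 5, 6}, {1, 4, 5}]
NO_INSTANCE = [{1, 2, 3}, {3, 4, 5}, {2, 5, 6}]
print(has_exact_cover(2, YES_INSTANCE), can_lower_pressure(2, YES_INSTANCE))
print(has_exact_cover(2, NO_INSTANCE), can_lower_pressure(2, NO_INSTANCE))
```

On both instances the two answers agree, as the equivalence stated in the proof predicts.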

The comparison between this last theorem and Theorem 4 is
very interesting. Indeed, our first (false) intuition was that choosing
which variables to remove so as to go from Ω to Ω - k was exactly
symmetric to choosing which variables to keep so as to get
down to k. At first sight, it seemed that dynamic programming
could be used, as for Theorem 4, to solve the incremental spill
everywhere problem. For interval graphs, both problems can indeed
be solved with dynamic programming, as we previously showed.
The incremental approach would then have provided a heuristic
for the main spill everywhere problem, as an alternative to an
exact solution as in [1], which is too expensive when r is large.
Unfortunately, Theorem 6 contradicts this intuition. In fact, the
two problems are not perfectly symmetric: to make the graph
k-colorable, the number of kept variables live at any point should
be at most k, while to make a graph (Ω - k)-colorable, the number
of removed variables live at any point must be at least k, as for the
point p in the proof of Theorem 6. This is where the combinatorial
complexity comes from.

4. Spill Everywhere with Holes on a Basic Block 

The previous section dealt with the spill everywhere problem with- 
out holes. To summarize, this problem is polynomial for a basic 
block even in its weighted version whereas, most of the time, it is 
NP-complete for a general control flow graph under SSA form. As 
mentioned earlier, the model without holes does not reflect the re- 
ality of most architectures. The goal of this section is to tackle the 
problem of spill everywhere with holes on a basic block. 

Where do the holes come from? For an architecture where 
operations are allowed only between registers, whenever a variable 
is spilled, one needs to insert load instructions before the uses of 
this variable and a store instruction after its definition. This means 
that new variables appear, with very short live ranges but which 
nonetheless need to be assigned to registers. In other words, when 
a variable is spilled, the number of simultaneously alive variables 
decreases by one at every point of the live range, except where the 
variable is defined or used. Thus spilling a variable everywhere
does not remove the complete interval, but only parts of it, since
some tiny sub-intervals are still left. This is why, for instance,
in the algorithm of Chaitin et al. [8], the register allocator must rebuild
the interference graph and iterate if some variables are spilled.
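To make the origin of these short live ranges concrete, here is a minimal Python sketch of spill-code insertion on a linear code. The instruction encoding (opcode, defs, uses), the opcode names, and the naming scheme for the fresh temporaries are all assumptions made for this illustration; they are not taken from any particular compiler.

```python
def insert_spill_code(code, var):
    """Spill `var` everywhere in a linear code.

    `code` is a list of (opcode, defs, uses) triples (hypothetical encoding).
    A load into a fresh temporary is inserted before each use and a store
    after the definition; each temporary has a very short live range.
    """
    out = []
    fresh = 0
    for opcode, defs, uses in code:
        if var in uses:
            tmp = f"{var}.{fresh}"
            fresh += 1
            out.append(("load", [tmp], []))            # reload from the spill slot
            uses = [tmp if u == var else u for u in uses]
        if var in defs:
            tmp = f"{var}.{fresh}"
            fresh += 1
            defs = [tmp if d == var else d for d in defs]
            out.append((opcode, defs, uses))
            out.append(("store", [], [tmp]))           # save to the spill slot
        else:
            out.append((opcode, defs, uses))
    return out

CODE = [
    ("const", ["a"], []),
    ("const", ["b"], []),
    ("add", ["c"], ["a", "b"]),
]
for instruction in insert_spill_code(CODE, "a"):
    print(instruction)
```

In the rewritten code, the temporaries a.0 and a.1 are live only around the definition and the use of a: these are exactly the tiny sub-intervals that still need a register.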

Holes and chads: The notion of holes can be formalized as 
follows. An SSA code on a basic block, or linear SSA code, is a pair 
C = (B, V) where B = {p_1, ..., p_m} is a sequence of m instructions;

Figure 1. Example of punched intervals.



Problem: Spill everywhere with holes 

Instance: A code C = (T, V) with Maxlive Ω = ω(C), a weight
w(v) > 0 for each variable, integers r and K.
Question: Can we spill variables V_S ⊆ V with overall
weight Σ_{v ∈ V_S} w(v) ≤ K such that the induced code C' has
Maxlive Ω' ≤ r?

Other instances: The spill everywhere on a basic block denotes
the case where T is a sequence B (linear code). The spill
everywhere with few registers (k) denotes the case where r is
fixed equal to k. The spill everywhere with many registers (k)
denotes the case where r is equal to Ω - k. The incremental spill
everywhere denotes the case where r is equal to Ω - 1.



and V the set of variables which appear in those instructions. 
An instruction first uses simultaneously some variables and then 
possibly defines some other new variables. Each variable of V 
is defined at most once and, if it is not defined, it is live-in for 
the sequence B. Also, each variable either has a "last use" (last 
instruction which uses it) or is live-out for the sequence. A variable 
is represented by a simple interval of the sequence B, starting at the 
middle of the instruction that defines it (or at the beginning of B for 
a live-in), and ending at the middle of its last use (or at the ending 
of B for a live-out). Spilling a variable v ∈ V decreases by one
the register pressure at each of its points but not at its definition
and use points: the set of points that is actually "removed" is
the interval v with holes on it, so we call it a punched interval.
The remaining points c ∈ v which are not removed are called
chads, as if, when spilling the variable v, one first had punched
the corresponding interval, leaving small intervals in place. See
Figure 1 for a graphical explanation.

Simultaneous holes: Also, we distinguish different cases depending
on h, the number of simultaneous holes. This number corresponds
to the maximum number of registers which can be used (arguments)
by the same instruction or defined by the same instruction.
For instance, h = 2 in the following three-operand addition:
add %reg1, %reg2 => %reg3. Finally, for a given point p
of B, the set of variables live at p is denoted by L(p). Its cardinality,
the register pressure, is denoted by l(p) = |L(p)| and Maxlive,
the maximum of l(p) over all points p ∈ B, is denoted by ω(C).
Once some variables V_S have been spilled, the induced code can
be characterized as follows. The set of spilled variables live at p is
L_S(p) = V_S ∩ L(p); the set of non-spilled live variables is L'(p) =
L(p) \ L_S(p). The new register pressure is denoted by l'(p). Notice
that L'(p) does not contain any chad, whereas of course l'(p) needs
to take remaining chads into account. Hence l'(p) is not necessarily
equal to |L'(p)| but, more generally, |L'(p)| ≤ l'(p) ≤ |L'(p)| + h.
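These definitions can be illustrated by a small Python sketch that computes l'(p) on a linear code. The model is an assumption made for this example: each instruction i is split into a use half-point 2i and a definition half-point 2i + 1, so that a live range starts at the "middle" of its defining instruction and ends at the "middle" of its last use. The class Var and the functions span, is_chad, pressure and maxlive are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Var:
    name: str
    definition: Optional[int]             # defining instruction, None if live-in
    uses: FrozenSet[int] = frozenset()    # instructions that use the variable
    live_out: bool = False

def span(v, m):
    """First and last half-point of the live range in a code of m instructions."""
    start = 0 if v.definition is None else 2 * v.definition + 1
    end = 2 * m - 1 if v.live_out else 2 * max(v.uses)
    return start, end

def is_chad(v, q):
    """A spilled variable still needs a register where it is used or defined."""
    if q % 2 == 0:
        return q // 2 in v.uses
    return v.definition == q // 2

def pressure(variables, spilled, q, m):
    """l'(q): non-spilled live variables plus chads of spilled ones."""
    total = 0
    for v in variables:
        start, end = span(v, m)
        if start <= q <= end and (v.name not in spilled or is_chad(v, q)):
            total += 1
    return total

def maxlive(variables, spilled, m):
    return max(pressure(variables, spilled, q, m) for q in range(2 * m))

# i0: a = ...; i1: b = ...; i2: c = ...; i3: d = b op c; i4: e = a op d (e live-out)
M = 5
VARS = [
    Var("a", 0, frozenset({4})),
    Var("b", 1, frozenset({3})),
    Var("c", 2, frozenset({3})),
    Var("d", 3, frozenset({4})),
    Var("e", 4, live_out=True),
]
print(maxlive(VARS, set(), M))    # 3: a, b and c are live together
print(maxlive(VARS, {"a"}, M))    # 2: the chads of a avoid the pressure peak
print(maxlive(VARS, {"b"}, M))    # 3: the chad of b at its use sits on the peak
```

Spilling b removes a variable but leaves Maxlive unchanged, because its use in instruction i3 is a chad located exactly where the pressure is maximal.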



As explained in [11], the hardness of load-store optimization
comes from the fixed cost of the store (once a variable is chosen
to be evicted) while the number of loads (number of times it is
evicted) is not fixed. Neglecting the cost of the store would lead to
a polynomial problem where each sub-interval of the punched interval
could be considered independently for spilling. But we feel
that this approximation is not satisfactory in practice because the
mean number of uses for each variable can be small. Indeed, we
measured on our compiler tool-chain, using small kernels representative
of embedded applications, that most spilled variables have at
most two uses. Hence, minimizing the number of spilled variables
is nearly as important as minimizing the number of unsatisfied uses.
Consider for example a furthest-first-like strategy on sub-intervals
(see Figure 1 for an illustration of sub-intervals). To design such a
heuristic, a spill everywhere solution might be considered to drive
decisions: between several candidates that end the furthest, which
one is the most suitable to be evicted in the future? Unfortunately,
as summarized by Table 2, most instances of spill everywhere with
holes are NP-complete for a basic block.

We start with a result similar to Theorem 4: even with holes, the
spill everywhere problem with few registers is polynomial.

Theorem 7 (Dynamic programming on non-spilled variables). The
spill everywhere problem with holes and few registers is polynomially
solvable even if w ≠ 1.



All previous notions can be generalized to a general SSA pro- 
gram. The sequence B (linear code) becomes a tree T (dominance 
tree) and punched intervals become punched subtrees. Now, the 
(general) problem can be stated as follows. 



Proof: The proof is similar to the proof of Theorem 4. The only
point is to adapt the notations to take chads into account. The
word "removed" has to be replaced by "spilled" since variables are
not removed entirely. Furthermore, the definition of "fitting set"
needs to be modified. A set F_p of variables is a fitting set for p if,
when all variables not in F_p are spilled, the new register pressure
l'(p) is at most r. In other words, the set of fitting sets becomes
ℱ_p = {L'(p) ; l'(p) ≤ r}. Hence, it is "harder" for a set to be a fitting
set than for the problem without holes. Therefore, the number of
fitting sets is smaller and is still at most |L(p)|^k ≤ Ω^k.

Table 2. Spill on interval graphs with holes.

As in Theorem 4, the proof is an induction on points p of T
(from the leaves to the root) and on fitting live sets F_p ∈ ℱ_p.
W_max(p, F_p) is built, for each F_p ∈ ℱ_p, thanks to dynamic programming,
by "concatenating" some well-chosen W_max(f, F_f). Given a
child f of p, we select a fitting set F_f ∈ ℱ_f that matches F_p, i.e.,
such that F_f ∩ L(p) = F_p ∩ L(f), and that maximizes the cost
of W_max(p, F_p). We do this for each child of p and, because by
construction they match on p, they can be expanded to a solution
W_max(p, F_p) that fits T_p. The arguments are the same as for Theorem 4
and are not repeated here. □
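The dynamic programming scheme can be sketched in Python for the special case of a basic block, where the tree T degenerates into a chain and each point has a single neighbor to match with. This is only an illustration under assumptions of ours: points are the integers 0 to num_points - 1, each variable is given directly by its first point, last point, set of chad points and weight, and all names (live, pressure, fitting_sets, max_kept_weight) are hypothetical.

```python
from itertools import combinations

# name -> (first point, last point, chad points, spill weight); assumed encoding
EXAMPLE = {
    "a": (1, 8, {1, 8}, 3),
    "b": (3, 6, {3, 6}, 1),
    "c": (5, 6, {5, 6}, 1),
    "d": (7, 8, {7, 8}, 2),
    "e": (9, 9, {9}, 1),
}

def live(variables, q):
    return frozenset(n for n, (s, e, _, _) in variables.items() if s <= q <= e)

def pressure(variables, kept, q):
    """l'(q) when exactly the variables outside `kept` are spilled."""
    return sum(1 for n in live(variables, q)
               if n in kept or q in variables[n][2])

def fitting_sets(variables, q, r):
    """Sets of at most r kept live variables whose pressure at q is at most r."""
    names = sorted(live(variables, q))
    for size in range(min(r, len(names)) + 1):
        for combo in combinations(names, size):
            if pressure(variables, frozenset(combo), q) <= r:
                yield frozenset(combo)

def max_kept_weight(variables, num_points, r):
    """Maximum total weight of non-spilled variables, or None if infeasible."""
    best = {frozenset(): 0}          # nothing is live before the first point
    prev_live = frozenset()
    for q in range(num_points):
        cur_live = live(variables, q)
        new_best = {}
        for kept in fitting_sets(variables, q, r):
            # Matching condition: both sets agree on the variables live at both points.
            scores = [w for f, w in best.items()
                      if f & cur_live == kept & prev_live]
            if scores:
                gain = sum(variables[n][3] for n in kept - prev_live)
                new_best[kept] = max(scores) + gain
        best, prev_live = new_best, cur_live
    return max(best.values()) if best else None

total = sum(w for _, _, _, w in EXAMPLE.values())
kept = max_kept_weight(EXAMPLE, 10, 2)
print("kept weight:", kept, "spill cost:", total - kept)
```

With r = 2 the only way to fit is to spill a, the heaviest variable, because the chads of b and c coincide at the pressure peak. With r = 1 the function returns None: the two chads at the same point cannot fit in one register, whatever is spilled.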

We have seen that, without holes, the spill everywhere problem
on an SSA program, with few registers, is polynomial whereas the
instance with many registers (k) is NP-complete: the number of
spilled variables live at a given point can be arbitrarily large (up
to Ω). For a basic block, if h is fixed, this is not the case anymore.
As we will see, this number is bounded by 2(h + k), leading to a
dynamic programming algorithm with O(|B| Ω^(2(h+k))) steps.

Theorem 8 (Dynamic programming on spilled variables). The spill
everywhere problem with holes and many registers can be solved
in polynomial time, for a basic block, if h is fixed, even if w ≠ 1.

Proof: The key point is to first prove that, for an optimal solution,
for each point p, |L_S(p)| ≤ 2(h + k). Consider a point p such that
|L_S(p)| ≥ h + k + 1. We extend this point to a maximal interval I such
that, at any point p of this interval, |L_S(p)| ≥ h + k + 1. We claim that
there is no spilled variable v ∈ V_S completely included in I. Indeed,
otherwise, if v were restored (unspilled), then, at each point p of v,
at least (h + k + 1) - 1 = h + k variables would still be spilled, so
the register pressure l'(p) ≤ |L'(p)| + h ≤ (Ω - (h + k)) + h = Ω - k
would still be small enough. This would contradict the optimality of
the initial solution. Hence, no variable of V_S is completely included
in I: either it starts before the beginning of I, or it ends after the end
of I. But I is of maximal size, hence on both extremities, there are
at most h + k live spilled variables. This means that there are at most
2(h + k) spilled variables live at any point of I.

The rest of the proof is similar to the proofs of Theorems 4
and 7. The only difference is that spilled variables are considered
instead of kept variables. For a point p, an extra live set E_p is a set of
variables of cardinality at most 2(h + k) and such that, if E_p is spilled,
the new register pressure l'(p) becomes at most r. Let ℰ_p be the
set of extra sets for p. It has at most |L(p)|^(2(h+k)) ≤ Ω^(2(h+k)) elements.

The proof is an induction on points p of B = {p_1, ..., p_m} and on
extra live sets E_p ∈ ℰ_p. Let B_{p_i} = {p_1, ..., p_i}. A set of variables is
said to fit B_p if, for all points in B_p, the register pressure obtained if
all other variables are spilled is at most r. The induction hypothesis
is that a solution W_max(p, E_p) of maximum cost, that fits B_p, and
with L_S(p) = E_p, can be built in polynomial time. Let p be a
point of B and f its predecessor. Let E_p ∈ ℰ_p, and let E_f be an extra live
set that matches E_p, i.e., such that E_f ∩ L(p) = E_p ∩ L(f),
and that maximizes the cost of W_max(f, E_f). As noticed earlier,
|ℰ_f| ≤ Ω^(2(h+k)) and it can be built, by induction hypothesis, in
polynomial time. Because E_p and E_f match, W_max(f, E_f) can be
expanded to a solution W_max(p, E_p) that fits B_p. The arguments are
the same as those used for Theorems 4 and 7.

The proof is constructive and provides an algorithm based on
dynamic programming with O(|B| Ω^(2(h+k))) steps. □

The next theorems show that the complexity does depend
on h and k. If h is not fixed but k = 1, the incremental problem is
NP-complete (Theorem 9). If h is fixed but there are no constraints
on r, most instances are NP-complete (Theorems 10 and 11).

Theorem 9 (From Minimum Cover). The incremental spill everywhere
with holes is NP-complete even if w(v) = 1 for each v ∈ V
and even on a basic block, if h can be arbitrary.

Proof: The proof is a straightforward reduction from Minimum
Cover [12, Problem SP5]. Let 𝒱 be a collection of subsets of a finite
set 𝒮 and 𝒦 ≤ |𝒱| be a positive integer. Does 𝒱 contain a cover for 𝒮 of
size 𝒦 or less, i.e., a subset 𝒱' ⊆ 𝒱 such that every element of 𝒮
belongs to at least one member of 𝒱'? Punched intervals can be
seen as subsets of B: they contain all points, except chads.

Consider an instance of Minimum Cover. To each element of 𝒮
corresponds a point of B. To each element v of 𝒱 corresponds a
punched interval v that traverses B entirely and that only contains
points corresponding to elements of v. In other words, there is a
chad for each point not in v. At each point p of B, the number
of punched intervals and chads that contain p (live variables) is
exactly Ω = |V|. A spilling that lowers the register pressure Ω by
at least one provides a cover of B and conversely. So, setting K = 𝒦
and r = Ω - 1 proves the theorem. □
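As a sanity check of this reduction, the following Python sketch compares, by brute force, the Minimum Cover question with the incremental spill question on the code built from it. The function names and the direct use of the elements of 𝒮 as points of B are assumptions of this example; the search is exponential and only meant for tiny instances.

```python
from itertools import combinations

def pressure_after_spill(subsets, spilled, point):
    """l'(p): every variable spans all of B; a spilled one keeps a chad
    at each point that is NOT an element of its subset."""
    kept = len(subsets) - len(spilled)
    chads = sum(point not in subsets[i] for i in spilled)
    return kept + chads

def incremental_spill_exists(universe, subsets, budget):
    """Can at most `budget` spills bring every point down to Omega - 1?"""
    target = len(subsets) - 1
    for size in range(budget + 1):
        for spilled in combinations(range(len(subsets)), size):
            if all(pressure_after_spill(subsets, spilled, p) <= target
                   for p in universe):
                return True
    return False

def cover_exists(universe, subsets, budget):
    """Brute-force Minimum Cover with at most `budget` subsets."""
    for size in range(budget + 1):
        for chosen in combinations(subsets, size):
            if set().union(*chosen) >= set(universe):
                return True
    return False

def simultaneous_chads(universe, subsets):
    """Value of h required by the construction."""
    return max(sum(p not in s for s in subsets) for p in universe)

UNIVERSE = {1, 2, 3, 4, 5}
SUBSETS = [{1, 2}, {2, 3}, {3, 4, 5}, {1, 4}, {5}]
for budget in (1, 2):
    print(budget, cover_exists(UNIVERSE, SUBSETS, budget),
          incremental_spill_exists(UNIVERSE, SUBSETS, budget))
print("h =", simultaneous_chads(UNIVERSE, SUBSETS))
```

The last line makes the weakness of the construction visible: the number of simultaneous chads grows with the instance.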

Notice that the previous proof is very similar to the proof of
Lemma 3.1 by Farach-Colton and Liberatore [11]. This lemma
proves the NP-completeness of the load-store optimization problem,
which is harder than our spill everywhere problem. Still, their
reduction is similar to ours since they used a trick to force the overall
load cost to be the same for all spilled variables, independently
of the number of times a variable is evicted. Hence, the optimal
solution to their load-store optimization problem just behaves like
a spill everywhere solution.

The main limitation of the reduction used for Theorem 9 is
that the proof needs the number of simultaneous chads h to be
arbitrarily large, as large as |V|. This is of course not realistic for
real architectures. In practice, usually h = 2 and even h = 1 for
paging problems. Similarly to ours, the reduction of Farach-Colton
and Liberatore uses a large number of simultaneous uses (in [11], a
read corresponds to a use and a corresponds to h). Theorem 3.2
of [11] extends their lemma to the case a = 1 but again, it deals
with the load-store optimization problem, which is harder than spill
everywhere. Unfortunately, their trick cannot be applied to prove
the NP-completeness of our "simpler" problem and we need to use
a different reduction as shown below.

Figure 2. For each edge in E, a corresponding region in B. With β large enough, spilling this region with r registers is equivalent to spilling
the simplified region with r - 1 registers.



Theorem 10 (At most 2 simultaneous chads). The spill everywhere
problem with holes is NP-complete even if w(v) = 1 for all v ∈ V,
even with at most 2 simultaneous chads, and even on a basic block.

Proof: The proof is a straightforward reduction from Independent
Set [12, Problem GT20]. Let G = (𝒱, E) be a graph and 𝒦 ≤ |𝒱|
be a positive integer. Does G contain an independent set (stable set)
𝒱_S of size 𝒦 or more, i.e., a subset 𝒱_S ⊆ 𝒱 such that |𝒱_S| ≥ 𝒦
and no two vertices in 𝒱_S are joined by an edge (adjacent) in E?

Consider an instance of Independent Set. To each vertex v ∈ 𝒱
of G corresponds a variable v ∈ V which is live from the entry of B
to its exit. To each edge (u, v) ∈ E of G corresponds a point p(u, v)
of B that contains a use of the corresponding variables u and v.
In other words, there are two chads for each point of B. The key
point is to notice that spilling K variables in V_S lowers the register
pressure to |V| - K + 1 if and only if the corresponding set of
vertices 𝒱_S is an independent set. Indeed, if 𝒱_S contains two
adjacent vertices u and v, then at point p(u, v), the register pressure
would be |V| - K + 2. Hence, by letting K = 𝒦 and r = |V| - K + 1,
we get the desired reduction. Indeed, if there exist k ≤ K variables
that, when spilled, lead to a register pressure at most r = |V| - K + 1
then, first, k must be equal to K and, second, the corresponding
vertices form an independent set of size K. Conversely, if there is
an independent set of size at least K, then spilling the corresponding
variables leads to a register pressure at most |V| - K + 1. □
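The pressure computation behind this reduction is simple enough to be replayed in a few lines of Python. The sketch below is illustrative only: the function names are ours, points of B are represented directly by the edges of G, and the graph is assumed to contain at least one edge with K ≤ |V|. Both questions are answered by brute force.

```python
from itertools import combinations

def pressure_at_edge(vertices, spilled, edge):
    """l'(p(u, v)): all variables are live everywhere; a spilled endpoint
    of the edge keeps a chad at the point p(u, v)."""
    u, v = edge
    kept = len(vertices) - len(spilled)
    chads = (u in spilled) + (v in spilled)
    return kept + chads

def spill_exists(vertices, edges, K):
    """Can at most K spills reach a pressure of r = |V| - K + 1 everywhere?"""
    r = len(vertices) - K + 1
    for size in range(K + 1):
        for spilled in combinations(vertices, size):
            chosen = set(spilled)
            if all(pressure_at_edge(vertices, chosen, e) <= r for e in edges):
                return True
    return False

def independent_set_exists(vertices, edges, K):
    """Brute-force Independent Set of size exactly K."""
    for chosen in combinations(vertices, K):
        s = set(chosen)
        if not any(u in s and v in s for u, v in edges):
            return True
    return False

PATH = (["a", "b", "c"], [("a", "b"), ("b", "c")])
TRIANGLE = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
for name, (vs, es) in (("path", PATH), ("triangle", TRIANGLE)):
    for K in (1, 2, 3):
        print(name, K, independent_set_exists(vs, es, K), spill_exists(vs, es, K))
```

On these two graphs, both answers coincide for every K, which matches the equivalence claimed in the proof.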

Theorem 11 (No simultaneous chads). The spill everywhere problem
with holes is NP-complete even if h = 1 and for a basic block.

Proof: As for Theorem 10, the proof is a reduction from Independent
Set. Consider an instance of Independent Set. To each vertex
v ∈ 𝒱 of G corresponds a variable v ∈ V (called a vertex variable),
which is live from the entry of B to its exit. To each edge (u, v) ∈ E
of G corresponds a region in B where u and v are consecutively
used. As depicted in Figure 2, such a region contains two additional
overlapping local variables δ_u and δ_v (called δ variables). For real
codes, every live range must contain a chad at the beginning and a
chad at the end. For our proof, we need to be able to remove the
complete live range of a δ variable, which is not possible because
of the presence of chads for such variables. To avoid this problem,
we increase the register pressure by 1 everywhere, except where δ
variables have chads. See Figure 2 again: we add new variables f_i
such that the union of their live ranges covers exactly all points
of B, except the points that correspond to the chad of a δ variable.
The cost β of spilling a variable f_i will be chosen large enough so
that f_i variables are never spilled in an optimal solution. So, from
now on, without loss of generality, we consider the simplified version
of the region (right-hand side of Figure 2) where δ live ranges
contain no chads. We let K = 𝒦 and r = |𝒱| - K + 1. The cost for
spilling a vertex variable is α while the cost for spilling a δ variable
is 1. The suitable value for α will be determined later.

The trick is to make sure that an optimal solution of our spilling
problem spills exactly K vertex variables and at least |E| of the δ
variables (one per region). We do so by letting α = 2|E| + 1 (in fact
α = |E| + 1 would be enough but we do so to simplify the proof).
First, spilling K - 1 vertex variables in addition to all δ variables is
not enough: on the chad of one of the spilled variables, the register
pressure will only be lowered to |𝒱| - (K - 1) + 1 = |𝒱| - K + 2 > r.
Second, spilling K vertex variables requires spilling at least one δ
variable per region, and spilling all δ variables is enough. Hence,
the minimum cost of a spilling with exactly K vertex variables is
between Kα + |E| and Kα + 2|E|. Finally, spilling K + 1 vertex variables
has a cost of at least (K + 1)α = Kα + 2|E| + 1.

Now, it remains to show that the cost of an optimal spilling is
Kα + |E| if and only if the spilled variables define an independent
set for G. Consider an edge (u, v). All situations are depicted in
Figure 3. If both u and v are spilled (in this case, 𝒱_S is not a stable
set), then both δ_u and δ_v must be spilled and the cost cannot be
Kα + |E|. Otherwise, spilling either δ_u or δ_v is enough. □

5. Conclusion 

Recent results on the SSA form have opened promising directions 
for the design of register allocation heuristics, especially for dy- 
namic embedded compilation. Studying the complexity of the spill 
everywhere problem was important in this context. Unfortunately, 
our work shows that SSA does not simplify the spill problem like 
it does for the assignment (coloring) problem. Still, our results can 
provide insights for the design of aggressive register allocators that 
trade compile time for provably "optimal" results. Our study con- 
siders different singular variants of the spill everywhere problem. 

1. We distinguish the problem without or with holes depending on 
whether use operands of instructions can reside in memory slots 
or not. Live ranges are then contiguous or with chads. 

2. For the variant with chads, we study the influence of the number 
of simultaneous chads (maximum number of use operands of an 
instruction and maximum number of definition operands of an 
instruction). 

3. We distinguish the case of a basic block (linear sequence) and
of a general SSA program (tree).

Figure 3. Different configurations whether u and v are spilled or not with r = |𝒱| - K + 1 registers. Non-spilled variables are in bold.



4. Our model uses a cost function for spilling a variable. We 
distinguish whether this cost function is uniform (unweighted) 
or arbitrary (weighted). 

5. Finally, in addition to the general case, we consider the singular 
case of spilling with few registers and the case of an incremental 
spilling that would lower the register pressure one by one. 

The classical furthest-first greedy algorithm is optimal only for the 
unweighted version without holes on a basic block. An ILP formulation
can solve, in polynomial time, the weighted version, but
unfortunately, only for a basic block, not a general SSA program.
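For reference, the furthest-first greedy strategy mentioned above can be sketched as follows for the unweighted problem without holes on a basic block. The encoding is an assumption of this example: each variable is a closed interval (start, end) of integer points, and furthest_first_spill and max_overlap are hypothetical names. Whenever more than r intervals are live, the live interval that ends the furthest is removed.

```python
def furthest_first_spill(intervals, r):
    """Greedy spill everywhere without holes, unweighted, on a basic block.

    `intervals` maps a variable name to a closed interval (start, end).
    Returns the set of spilled variables so that at most r remain live
    at every point.
    """
    spilled = set()
    active = []                                   # non-spilled variables live at the scan point
    for name, (start, _) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        active = [a for a in active if intervals[a][1] >= start]
        active.append(name)
        if len(active) > r:
            victim = max(active, key=lambda a: intervals[a][1])   # ends the furthest
            active.remove(victim)
            spilled.add(victim)
    return spilled

def max_overlap(intervals):
    """Maxlive of a set of closed intervals (checked at every endpoint)."""
    points = {p for s, e in intervals.values() for p in (s, e)}
    return max((sum(s <= p <= e for s, e in intervals.values()) for p in points),
               default=0)

EXAMPLE = {"a": (0, 9), "b": (1, 3), "c": (2, 4), "d": (5, 7), "e": (6, 8)}
for r in (1, 2):
    spilled = furthest_first_spill(EXAMPLE, r)
    remaining = {n: iv for n, iv in EXAMPLE.items() if n not in spilled}
    print(r, sorted(spilled), max_overlap(remaining))
```

With r = 2, only the long interval a is spilled; with r = 1, the greedy spills a, c and e and keeps the two disjoint intervals b and d. As stated above, this strategy loses its optimality as soon as weights or holes are introduced.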

The positive result of our study for architectures with few regis- 
ters is that the spill everywhere problem with a bounded number of 
registers is polynomial even with holes. Of course, the complexity 
is exponential in the number of registers, but for architectures like 
x86, it shows that algorithms based on dynamic programming can 
be considered in an aggressive compilation context. In particular, 
it is a possible alternative to commercial solvers required by ILP 
formulations of the same problem. For architectures with a large 
number of registers, we have studied the a priori symmetric prob- 
lem where one needs to decrease the register pressure by a constant 
number. Our hope was to design a heuristic that would incremen- 
tally lower one by one the register pressure to meet the number of 
registers. Unfortunately, this problem is NP-complete too. 

To conclude, our study shows that complexity also comes from 
the presence of chads. The problem of spill everywhere with chads 
is NP-complete even on a basic block. On the other hand, the incremental
spilling problem is still polynomial on a basic block provided
that the number of simultaneous chads is bounded. Fortunately,
this number is very low on most architectures.